seed voice conversion

zero-shot voice conversion and zero-shot singing voice conversion

https://github.com/Plachtaa/seed-vc

dependencies

!git clone https://github.com/Plachtaa/seed-vc.git
Cloning into 'seed-vc'...
remote: Enumerating objects: 907, done.
remote: Counting objects: 100% (280/280), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 907 (delta 200), reused 211 (delta 174), pack-reused 627 (from 2)
Receiving objects: 100% (907/907), 66.32 MiB | 17.72 MiB/s, done.
Resolving deltas: 100% (449/449), done.
%cd seed-vc
!pip install -r requirements.txt
!pip install --upgrade protobuf==5.29.3
# run this cell again if the kernel was restarted
%cd /content/seed-vc/
/content/seed-vc
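
Every later cell assumes the working directory is the seed-vc checkout. A minimal sketch of a sanity check follows; the list of required files is an assumption taken from the commands used in this notebook, and `missing_repo_files` is a hypothetical helper, not part of seed-vc.

```python
from pathlib import Path

# Files the cells in this notebook call or install from (assumed list).
REQUIRED_FILES = ("requirements.txt", "inference.py", "app_svc.py")


def missing_repo_files(repo_dir="."):
    """Return the required files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_repo_files(".")
if missing:
    print("not inside the seed-vc checkout, missing:", ", ".join(missing))
else:
    print("working directory looks like the seed-vc checkout")
```

If files are reported missing, rerunning the `%cd` cell above should fix it.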

singing voice conversion

download songs

!mkdir -p input
!wget -O input/japanese.wav https://plachtaa.github.io/seed-vc/demos/references/teio_0.wav
!wget -O input/seeyouagain.wav https://huggingface.co/spaces/Plachta/Seed-VC/resolve/main/examples/source/Wiz%20Khalifa%2CCharlie%20Puth%20-%20See%20You%20Again%20%5Bvocals%5D_%5Bcut_28sec%5D.wav
!wget -O input/trump.wav https://plachtaa.github.io/seed-vc/demos/references/trump_0.wav
!wget -O input/ref_song.wav https://plachtaa.github.io/seed-vc/demos/sources/%E4%B8%96%E7%95%8C%E8%BF%98%E5%B0%8F.wav
!wget -O input/dingzhen.wav https://plachtaa.github.io/seed-vc/demos/references/dingzhen_0.wav
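
Two of the URLs above end in percent-encoded file names, which hides what was downloaded. The sketch below decodes them with the standard library; the `DOWNLOADS` mapping is copied from the wget commands, and `original_name` is a hypothetical helper introduced for illustration.

```python
from urllib.parse import unquote, urlsplit

# Local file name -> download URL, copied from the wget commands above.
DOWNLOADS = {
    "japanese.wav": "https://plachtaa.github.io/seed-vc/demos/references/teio_0.wav",
    "seeyouagain.wav": "https://huggingface.co/spaces/Plachta/Seed-VC/resolve/main/examples/source/Wiz%20Khalifa%2CCharlie%20Puth%20-%20See%20You%20Again%20%5Bvocals%5D_%5Bcut_28sec%5D.wav",
    "trump.wav": "https://plachtaa.github.io/seed-vc/demos/references/trump_0.wav",
    "ref_song.wav": "https://plachtaa.github.io/seed-vc/demos/sources/%E4%B8%96%E7%95%8C%E8%BF%98%E5%B0%8F.wav",
    "dingzhen.wav": "https://plachtaa.github.io/seed-vc/demos/references/dingzhen_0.wav",
}


def original_name(url):
    """Decode the percent-encoded file name at the end of a URL."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1])


for local, url in DOWNLOADS.items():
    print(f"{local:16} <- {original_name(url)}")
```

Judging by the URL paths, the files under `references` are speaker references and the ones under `source` or `sources` are the songs to convert.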

run conversion

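# option 1: launch the web UI (this cell keeps running until the server is stopped)
!python app_svc.py --fp16 True --share True
# option 2: use the command line instead
# --diffusion-steps 100 gives the best quality, at the cost of speed
# --f0-condition True enables F0 (pitch) conditioning, used for singing voice conversion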
!python inference.py --source /content/seed-vc/input/ref_song.wav --target /content/seed-vc/input/japanese.wav --output output \
    --diffusion-steps 100  \
    --length-adjust 1.0 \
    --inference-cfg-rate 0.7 \
    --f0-condition True \
    --auto-f0-adjust False \
    --semi-tone-shift 0  \
    --fp16 True
2025-04-23 14:36:13.564924: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1745418973.581998   15926 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1745418973.587121   15926 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-23 14:36:13.609608: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: Skipped loading some keys due to shape mismatch: {'estimator.conv2.bias', 'estimator.res_projection.bias', 'estimator.input_pos', 'estimator.t_embedder2.mlp.2.bias', 'estimator.conv2.weight', 'estimator.t_embedder2.mlp.0.weight', 'estimator.t_embedder2.mlp.2.weight', 'estimator.conv1.weight', 'estimator.conv1.bias', 'estimator.res_projection.weight', 'estimator.t_embedder2.mlp.0.bias'}
cfm loaded
length_regulator loaded
Loading weights from nvidia/bigvgan_v2_44khz_128band_512x
Removing weight norm...
It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
100% 100/100 [00:24<00:00,  4.04it/s]
100% 100/100 [00:08<00:00, 11.40it/s]
RTF: 1.4874707420275481
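
The command above converts one source/reference pair, while the display cells in this notebook play results for other pairs. A minimal sketch for building the same command for several pairs is shown here. The flags are copied from the command above; `build_command` and `PAIRS` are hypothetical names, and the pairs are an assumption read off the file names used in the display cells.

```python
import shlex
from pathlib import PurePosixPath

INPUT_DIR = PurePosixPath("/content/seed-vc/input")


def build_command(source, target, output="output", diffusion_steps=100,
                  length_adjust=1.0, inference_cfg_rate=0.7):
    """Return the inference.py argument list for one source/reference pair."""
    return [
        "python", "inference.py",
        "--source", str(INPUT_DIR / source),
        "--target", str(INPUT_DIR / target),
        "--output", output,
        "--diffusion-steps", str(diffusion_steps),
        "--length-adjust", str(length_adjust),
        "--inference-cfg-rate", str(inference_cfg_rate),
        "--f0-condition", "True",
        "--auto-f0-adjust", "False",
        "--semi-tone-shift", "0",
        "--fp16", "True",
    ]


# (source song, reference voice) pairs, assumed from the display cells.
PAIRS = [("seeyouagain.wav", "trump.wav"), ("ref_song.wav", "dingzhen.wav")]

for source, target in PAIRS:
    # Only prints the command; to run it, pass the list to
    # subprocess.run(..., check=True) from inside the seed-vc directory.
    print(shlex.join(build_command(source, target)))
```

Passing the arguments as a list avoids shell quoting problems if a file name ever contains spaces.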

display results

# example 1
# plays the result for seeyouagain.wav converted to the voice in trump.wav
# (run the conversion for this pair first)
import os
from IPython.display import Audio, display
print("source song\n")
display(Audio("/content/seed-vc/input/seeyouagain.wav"))
print("reference voice\n")
display(Audio("/content/seed-vc/input/trump.wav"))
print("converted voice\n")
display(Audio("/content/seed-vc/output/vc_seeyouagain_trump_1.0_100_0.7.wav"))
source song
reference voice
converted voice
# example 2
# plays the result for ref_song.wav converted to the voice in dingzhen.wav
# (run the conversion for this pair first)
print("source song\n")
display(Audio("/content/seed-vc/input/ref_song.wav"))
print("reference voice\n")
display(Audio("/content/seed-vc/input/dingzhen.wav"))
print("converted voice\n")
display(Audio("/content/seed-vc/output/vc_ref_song_dingzhen_1.0_100_0.7.wav"))
source song
reference voice
converted voice
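
The converted file names played above appear to follow the pattern `vc_<source>_<reference>_<length-adjust>_<diffusion-steps>_<inference-cfg-rate>.wav`. This pattern is inferred from the two paths in this notebook, not taken from seed-vc documentation, and `converted_path` is a hypothetical helper. Under that assumption, the path for any run can be derived from its arguments:

```python
from pathlib import PurePosixPath


def converted_path(source, target, output_dir="/content/seed-vc/output",
                   length_adjust=1.0, diffusion_steps=100,
                   inference_cfg_rate=0.7):
    """Build the output path that the naming pattern above would give."""
    src = PurePosixPath(source).stem  # file name without directory or .wav
    tgt = PurePosixPath(target).stem
    name = (f"vc_{src}_{tgt}_{length_adjust}_"
            f"{diffusion_steps}_{inference_cfg_rate}.wav")
    return str(PurePosixPath(output_dir) / name)


# The pair used in the command-line cell of this notebook.
print(converted_path("/content/seed-vc/input/ref_song.wav",
                     "/content/seed-vc/input/japanese.wav"))
```

If a path built this way does not exist, listing the output directory shows the names the script actually wrote.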
# used inputs
#input_list = os.listdir("/content/seed-vc/input")
#for i in input_list:
#  print(i)
#  display(Audio("/content/seed-vc/input/"+i))
# generated output
#output_list = os.listdir("/content/seed-vc/output")
#for i in output_list:
#  print(i)
#  display(Audio("/content/seed-vc/output/"+i))
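
The commented-out loops above can be written as one reusable function. This is a sketch under the assumption that only `.wav` files are of interest; `list_wavs` is a hypothetical helper that returns an empty list when the directory does not exist, which is the case for the output directory before the first conversion.

```python
from pathlib import Path


def list_wavs(directory):
    """Return the sorted .wav paths in directory, or [] if it is missing."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".wav")


for wav in list_wavs("/content/seed-vc/output"):
    print(wav.name)
    # In a notebook, play each file with: display(Audio(str(wav)))
```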